AITopics | bellman update

Constrained Latent Action Policies for Model-Based Offline Reinforcement Learning

Neural Information Processing Systems, Mar-21-2026, 08:35:06 GMT

In offline reinforcement learning, a policy is learned from a static dataset, without costly feedback from the environment. In contrast to the online setting, relying only on static datasets poses additional challenges, such as policies generating out-of-distribution samples. Model-based offline reinforcement learning methods try to overcome these challenges by learning a model of the underlying dynamics of the environment and using it to guide policy search. This is beneficial, but with limited datasets, errors in the model and value overestimation on out-of-distribution states can worsen performance. Current model-based methods apply some notion of conservatism to the Bellman update, often implemented using uncertainty estimates derived from model ensembles. In this paper, we propose Constrained Latent Action Policies (C-LAP), which learns a generative model of the joint distribution of observations and actions. We cast policy learning as a constrained objective that always stays within the support of the latent action distribution, and use the generative capabilities of the model to impose an implicit constraint on the generated actions. This eliminates the need for additional uncertainty penalties on the Bellman update and significantly decreases the number of gradient steps required to learn a policy. We empirically evaluate C-LAP on the D4RL and V-D4RL benchmarks and show that C-LAP is competitive with state-of-the-art methods, outperforming them especially on datasets with visual observations.
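To make the "conservatism on the Bellman update" idea concrete, here is a minimal Python sketch of the kind of ensemble-based uncertainty penalty the abstract attributes to existing model-based methods. It is not C-LAP itself, which the abstract says avoids such penalties. The function name `penalized_bellman_target`, the `penalty_coef` parameter, and the use of the ensemble's standard deviation as the uncertainty estimate are all illustrative assumptions.

```python
import numpy as np


def penalized_bellman_target(reward, next_values, gamma=0.99, penalty_coef=1.0):
    """Conservative one-step Bellman target built from a model ensemble.

    reward:       scalar reward predicted for the (state, action) pair
    next_values:  shape (n_models,), value of the predicted next state
                  under each ensemble member (hypothetical inputs)
    penalty_coef: weight of the uncertainty penalty (assumed hyperparameter)
    """
    next_values = np.asarray(next_values, dtype=float)
    # Disagreement between ensemble members serves as an uncertainty proxy.
    uncertainty = next_values.std()
    # Standard Bellman target, lowered where the ensemble disagrees.
    return reward + gamma * next_values.mean() - penalty_coef * uncertainty
```

When all ensemble members agree, the penalty vanishes and the target reduces to the ordinary one-step Bellman target; the more they disagree, the more pessimistic the target becomes.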

artificial intelligence, machine learning, reinforcement learning, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.85)

Fast Bellman Updates for Wasserstein Distributionally Robust MDPs

Neural Information Processing Systems, Feb-19-2026, 17:09:55 GMT

Markov decision processes (MDPs) are often sensitive to model ambiguity. In recent years, robust MDPs have emerged as an effective framework to overcome this challenge. Distributionally robust MDPs extend the robust MDP framework by incorporating distributional information about the uncertain model parameters to alleviate the conservative nature of robust MDPs.
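As a rough illustration of what a robust Bellman update looks like, the Python sketch below takes the worst case over a finite set of candidate transition models, independently for each state-action pair. This is a deliberately simplified stand-in: the paper concerns Wasserstein distributionally robust MDPs, whose ambiguity sets and fast update are not reproduced here. The function name and array layout are assumptions made for the example.

```python
import numpy as np


def robust_bellman_update(V, rewards, candidate_models, gamma=0.9):
    """One robust Bellman update over a finite ambiguity set.

    V:                shape (n_states,), current value estimate
    rewards:          shape (n_states, n_actions)
    candidate_models: shape (n_models, n_states, n_actions, n_states),
                      each slice is one plausible transition kernel
    """
    V = np.asarray(V, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    candidate_models = np.asarray(candidate_models, dtype=float)
    # Expected next-state value under every candidate model: (n_models, S, A).
    expected_next = candidate_models @ V
    # The adversary picks the worst model separately for each (state, action).
    worst_case = expected_next.min(axis=0)
    # The agent then maximizes over actions against that worst case.
    return (rewards + gamma * worst_case).max(axis=1)
```

With a single candidate model this reduces to the ordinary Bellman optimality update; adding models can only lower the resulting values, which is the conservatism the abstract refers to.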

artificial intelligence, machine learning, reinforcement learning, (20 more...)

Neural Information Processing Systems

Country:

Asia > China > Hong Kong (0.04)
North America > United States > Massachusetts (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

A Appendix: Proofs and Algorithms A.1 Proofs of results in Section 4 Proof of Proposition 4.1. Plug B

Neural Information Processing Systems, Feb-12-2026, 15:50:08 GMT

(Bertsekas, 1999). Algorithm 1. Furthermore, we call f̂(·), […] We can show that |f(·) − f̂(·)| […], for all […] ∈ […]. Besides, computing the upper bound claimed in Proposition 4.2 requires finding […] The second equality is from the fact that the objective function is affine w.r.t. […] Finally, we verify the remaining two components. This finishes the proof of our claim.

artificial intelligence, optimization problem, sa 2, (19 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.31)

d2fe3a5711a6d488da9e9a78b84ee24c-Paper-Conference.pdf

Neural Information Processing Systems, Feb-12-2026, 02:37:20 GMT

algorithm, ambiguity, projection problem, (12 more...)

Neural Information Processing Systems

Country:

North America > United States > New Hampshire (0.04)
North America > United States > Massachusetts (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report > New Finding (0.68)

Industry:

Energy (0.68)
Transportation (0.46)
Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.50)

931af583573227f0220bc568c65ce104-Paper.pdf

Neural Information Processing Systems, Feb-9-2026, 23:15:26 GMT

international conference, proceedings, remert, (15 more...)

Neural Information Processing Systems

Country:

Europe > Sweden (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
Asia > China > Jiangsu Province > Nanjing (0.04)
(6 more...)

Genre: Research Report (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Exploiting the Replay Memory Before Exploring the Environment: Enhancing Reinforcement Learning Through Empirical MDP Iteration

Neural Information Processing Systems, Nov-19-2025, 23:03:16 GMT

Reinforcement learning (RL) algorithms are typically based on optimizing a Markov Decision Process (MDP) using the optimal Bellman equation.
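For reference, the Bellman optimality backup this sentence alludes to can be sketched as tabular value iteration in Python. This is generic textbook material, not the paper's Empirical MDP Iteration method; the function name, array shapes, and stopping rule are assumptions chosen for the example.

```python
import numpy as np


def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iters=10_000):
    """Tabular value iteration using the Bellman optimality backup.

    P: shape (n_states, n_actions, n_states), transition probabilities
    R: shape (n_states, n_actions), expected immediate rewards
    """
    P = np.asarray(P, dtype=float)
    R = np.asarray(R, dtype=float)
    V = np.zeros(P.shape[0])
    for _ in range(max_iters):
        # Q(s, a) = R(s, a) + gamma * sum_s' P(s' | s, a) * V(s')
        Q = R + gamma * (P @ V)
        # Bellman optimality backup: act greedily with respect to Q.
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V
```

For a discount factor below 1 the backup is a contraction, so repeated application converges to the optimal value function of the given MDP.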

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Country:

North America > Canada > Alberta (0.14)
Asia > Middle East > Jordan (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry:

Information Technology (0.67)
Leisure & Entertainment > Games > Computer Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Beyond Single-Step Updates: Reinforcement Learning of Heuristics with Limited-Horizon Search

Hadar, Gal; Agostinelli, Forest; Shperberg, Shahaf S.

arXiv.org Artificial Intelligence, Nov-14-2025

Many sequential decision-making problems can be formulated as shortest-path problems, where the objective is to reach a goal state from a given starting state. Heuristic search is a standard approach for solving such problems, relying on a heuristic function to estimate the cost to the goal from any given state. Recent approaches leverage reinforcement learning to learn heuristics by applying deep approximate value iteration. These methods typically rely on single-step Bellman updates, where the heuristic of a state is updated based on its best neighbor and the corresponding edge cost. This work proposes a generalized approach that enhances both state sampling and heuristic updates by performing limited-horizon searches and updating each state's heuristic based on the shortest path to the search frontier, incorporating both edge costs and the heuristic values of frontier states.
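The contrast between a single-step Bellman update and an update based on a limited-horizon search can be sketched in Python as follows. This is only an illustration under simplifying assumptions: it uses an exhaustive depth-limited lookahead on a small explicit graph rather than the search procedure or the learned heuristic of the paper, and the graph encoding and the name `limited_horizon_update` are hypothetical.

```python
def limited_horizon_update(graph, h, state, goal, horizon):
    """Updated heuristic for `state` from a depth-limited lookahead.

    graph:   dict mapping each state to a list of (neighbor, edge_cost) pairs
    h:       dict mapping each state to its current heuristic estimate
    horizon: number of edges to look ahead; horizon=1 is the single-step
             Bellman update (best neighbor plus the corresponding edge cost)
    """
    if state == goal:
        return 0.0  # the cost-to-go at the goal is zero by definition
    if horizon == 0:
        return h[state]  # frontier state: fall back on the current heuristic
    # Cheapest path to the frontier: edge costs plus frontier heuristic values.
    return min(
        (cost + limited_horizon_update(graph, h, nbr, goal, horizon - 1)
         for nbr, cost in graph[state]),
        default=float("inf"),  # dead end: no successors to expand
    )


# Hypothetical toy graph; the heuristic value for "B" is badly overestimated.
graph = {"A": [("B", 1), ("C", 1)], "B": [("G", 1)], "C": [("G", 5)], "G": []}
h = {"A": 0, "B": 10, "C": 0, "G": 0}
single_step = limited_horizon_update(graph, h, "A", "G", horizon=1)
two_step = limited_horizon_update(graph, h, "A", "G", horizon=2)
```

In this toy graph the single-step update relies on the inaccurate neighbor estimates, while the two-step lookahead reaches the goal and recovers the true cost from "A".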

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

2511.10264

Country: North America > United States (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

The Burden of Interactive Alignment with Inconsistent Preferences

Shirali, Ali

arXiv.org Artificial Intelligence, Oct-21-2025

From media platforms to chatbots, algorithms shape how people interact, learn, and discover information. Such interactions between users and an algorithm often unfold over multiple steps, during which strategic users can guide the algorithm to better align with their true interests by selectively engaging with content. However, users frequently exhibit inconsistent preferences: they may spend considerable time on content that offers little long-term value, inadvertently signaling that such content is desirable. Focusing on the user side, this raises a key question: what does it take for such users to align the algorithm with their true interests? To investigate these dynamics, we model the user's decision process as split between a rational system 2 that decides whether to engage and an impulsive system 1 that determines how long engagement lasts. We then study a multi-leader, single-follower extensive Stackelberg game, where users, specifically system 2, lead by committing to engagement strategies and the algorithm best-responds based on observed interactions. We define the burden of alignment as the minimum horizon over which users must optimize to effectively steer the algorithm. We show that a critical horizon exists: users who are sufficiently foresighted can achieve alignment, while those who are not are instead aligned to the algorithm's objective. This critical horizon can be long, imposing a substantial burden. However, even a small, costly signal (e.g., an extra click) can significantly reduce it. Overall, our framework explains how users with inconsistent preferences can align an engagement-driven algorithm with their interests in a Stackelberg equilibrium, highlighting both the challenges and potential remedies for achieving alignment.

artificial intelligence, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2510.16368

Genre: Research Report > Experimental Study (1.00)

Technology: